Skip to content

Add support for combining system:partition and features/extras in valid_systems#3613

Open
casparvl wants to merge 2 commits intoreframe-hpc:developfrom
casparvl:support_combining_syspart_features_valid_systems
Open

Add support for combining system:partition and features/extras in valid_systems#3613
casparvl wants to merge 2 commits intoreframe-hpc:developfrom
casparvl:support_combining_syspart_features_valid_systems

Conversation

@casparvl
Copy link

This was requested in #2820 , although that was eventually closed with a suggestion for a workaround.

However, I hit it again when trying to use reframe.utils.find_modules(substr, ...) in combination with wanting to specify additional features. Example use cases (and there's many more one can come up with):

  • I may have a software X for which want to not run CPU tests on a GPU node. I normally achieve that by assigning the CPU feature to my CPU partitions (but not to my GPU partitions) and then requesting the CPU feature from my CPU-only tests.
  • I may have two variants of a bandwidth test (which needs module Y): one sending small packets, one sending larger packets. I only want to run larger one only on partitions that have the infiniband feature.

In both cases, we want to test all modules with X or Y respectively that are available on our system, hence we want to use the find_modules function. However, the first element of the tuple this returns is the system:partition string for which it found the module matching the requested substring. This is clearly meant to be used in valid_systems directly (as is confirmed by the example in the docs), but for the examples above, you may want to only run on a subset of these system:partition combinations (based on the required features. It would be very natural to then do:

module_info = parameter(find_modules('X'))
...
self.valid_systems - [module_info[0] + ' +feat1']

To make sure that a partition is only valid if it provides the right module and the right feature.

It turned out that the code changes needed are very limited (it seems like a lot because the indentation changed, but the key part is:

  • not splitting the system:partition case from the feature/extra specification case
  • Handling the system:partition case with an extra elif on the subspec:
            elif ':' in subspec and not subspec.startswith(('+', '-', '%')):
                # If there is a system:partition specified, make sure it
                # matches one of the items in valid_matches
                syspart_match = True if subspec in valid_matches else False

And adding that syspart_match to the return logic:

        # If the partition has all the plus features, none of the minus
        # all of the properties and the system:partition spec (if any)
        # matched, this partition is valid
        if (
            have_plus_feats and not have_minus_feats and have_props
            and syspart_match
        ):
            return True

Note that this logic does not assume the system:partition to strictly come before or after the features/extras, so I've adapted the valid syntax specification to allow both:

_VALID_SYS_SYNTAX = rf'^(({_FKV}\s+)*{_S}(\s+{_FKV})*|{_FKV}(\s+{_FKV})*)$'

If anything, I feel the code has become cleaner by not handling system:partition as a special, separate case, but as 'just another item' that may occur as a subspec.

@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.37%. Comparing base (da2fbb3) to head (89259dd).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3613   +/-   ##
========================================
  Coverage    91.37%   91.37%           
========================================
  Files           62       62           
  Lines        13484    13484           
========================================
  Hits         12321    12321           
  Misses        1163     1163           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@casparvl
Copy link
Author

casparvl commented Jan 21, 2026

I've done some concrete testing to prove this works. Note that I have 4 partitions (rome, genoa, gpu_A100, gpu_H100), of which the first two have the cpu feature assigned. Each partition has two environments: EESSI-2023.06 and EESSI-2025.06.

**Case 1: traditional `system:partition` assignment**
module_info = parameter(find_modules('SciPy-bundle'))
...
        s, e, m = self.module_info
        self.valid_systems = [s]

Output:

[       OK ] (1/8) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') /586a9dfe @snellius:rome+EESSI-2023.06
...
[       OK ] (2/8) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') /fe4be974 @snellius:rome+EESSI-2025.06
...
[       OK ] (3/8) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') /93bc1dc5 @snellius:genoa+EESSI-2023.06
...
[       OK ] (4/8) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') /3034e72c @snellius:genoa+EESSI-2025.06
...
[       OK ] (5/8) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') /80754277 @snellius:gpu_A100+EESSI-2023.06
...
[       OK ] (6/8) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') /8f6a7e95 @snellius:gpu_A100+EESSI-2025.06
...
[       OK ] (7/8) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') /e6d136b9 @snellius:gpu_H100+EESSI-2023.06
...
[       OK ] (8/8) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') /bf9cd015 @snellius:gpu_H100+EESSI-2025.06
...
[----------] all spawned checks have finished
**Case 2: Combining system:partition with +feat1**
module_info = parameter(find_modules('SciPy-bundle'))
...
        s, e, m = self.module_info
        self.valid_systems = [s + ' +cpu']

Output:

[       OK ] (1/4) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /586a9dfe @snellius:rome+EESSI-2023.06
...
[       OK ] (2/4) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /fe4be974 @snellius:rome+EESSI-2025.06
...
[       OK ] (3/4) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /93bc1dc5 @snellius:genoa+EESSI-2023.06
...
[       OK ] (4/4) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /3034e72c @snellius:genoa+EESSI-2025.06
...
[----------] all spawned checks have finished
**Case 3: Combining system:partition with -feat1**
module_info = parameter(find_modules('SciPy-bundle'))
...
        s, e, m = self.module_info
        self.valid_systems = [s + ' +cpu']

Output:

[       OK ] (1/4) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /80754277 @snellius:gpu_A100+EESSI-2023.06
...
[       OK ] (2/4) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /8f6a7e95 @snellius:gpu_A100+EESSI-2025.06
...
[       OK ] (3/4) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /e6d136b9 @snellius:gpu_H100+EESSI-2023.06
...
[       OK ] (4/4) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /bf9cd015 @snellius:gpu_H100+EESSI-2025.06
...
[----------] all spawned checks have finished
**Case 4: Using only features**

This case clearly doesn't make any sense, because it'll create tests where the first element of module_info is X and then the partition that will be used is Y, but just to prove that a valid_systems with only features still works:

module_info = parameter(find_modules('SciPy-bundle'))
...
        s, e, m = self.module_info
        self.valid_systems = ['+cpu']

Output:

[       OK ] ( 1/16) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /586a9dfe @snellius:rome+EESSI-2023.06
...
[       OK ] ( 2/16) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /586a9dfe @snellius:genoa+EESSI-2023.06
...
[       OK ] ( 3/16) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /fe4be974 @snellius:rome+EESSI-2025.06
...
[       OK ] ( 4/16) EESSI_ModuleFinder2 %module_info=('snellius:rome', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /fe4be974 @snellius:genoa+EESSI-2025.06
...
[       OK ] ( 5/16) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /93bc1dc5 @snellius:rome+EESSI-2023.06
...
[       OK ] ( 6/16) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /93bc1dc5 @snellius:genoa+EESSI-2023.06
...
[       OK ] ( 7/16) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /3034e72c @snellius:rome+EESSI-2025.06
...
[       OK ] ( 8/16) EESSI_ModuleFinder2 %module_info=('snellius:genoa', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /3034e72c @snellius:genoa+EESSI-2025.06
...
[       OK ] ( 9/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /80754277 @snellius:rome+EESSI-2023.06
...
[       OK ] (10/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /80754277 @snellius:genoa+EESSI-2023.06
...
[       OK ] (11/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /8f6a7e95 @snellius:rome+EESSI-2025.06
...
[       OK ] (12/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_A100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /8f6a7e95 @snellius:genoa+EESSI-2025.06
...
[       OK ] (13/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /e6d136b9 @snellius:rome+EESSI-2023.06
...
[       OK ] (14/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2023.06', 'SciPy-bundle/2023.11-gfbf-2023b') %scale=1_core /e6d136b9 @snellius:genoa+EESSI-2023.06
...
[       OK ] (15/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /bf9cd015 @snellius:rome+EESSI-2025.06
...
[       OK ] (16/16) EESSI_ModuleFinder2 %module_info=('snellius:gpu_H100', 'EESSI-2025.06', 'SciPy-bundle/2025.06-gfbf-2025a') %scale=1_core /bf9cd015 @snellius:genoa+EESSI-2025.06
...
[----------] all spawned checks have finished

@casparvl
Copy link
Author

Hm, I have to recheck how the current if...elif...elif handles sysname * and *:partname cases. My bet is, it handles them incorrectly.


_S = rf'({_NW}(:{_NW})?)' # system/partition
_VALID_SYS_SYNTAX = rf'^({_S}|{_FKV}(\s+{_FKV})*)$'
_VALID_SYS_SYNTAX = rf'^(({_FKV}\s+)*{_S}(\s+{_FKV})*|{_FKV}(\s+{_FKV})*)$'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR would benefit from a rewrite of this line.
The system definition should not rely on spaces.
All of these definitions should be valid

1. *
2. daint:gpu
3. daint:mc
4. daint:mc +gpu
5. daint:mc +gpu +uenv
6. daint:mc+gpu
7. daint:mc+gpu +uenv
8. daint:mc +gpu+uenv
9. daint:mc +gpu +uenv+modules
10. daint:mc +gpu+uenv +modules

In the current form only the first four cases are valid.
The fifth case is miss interpreted, where one cannot see the +gpu entry.
And the last five case are not valid systems.

This implies that:

  1. Tests won't run because of a space, without any hint. Users may lose a lot of time because of a missing space.
  2. Potential introduction of silent bugs. The addition of a new feature makes the previous invalid.
  3. Hinders the ability to write tests with partitions that have multiple features.

We should make this feature more robust to the end user and to the future.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system definition should not rely on spaces.

Just to be clear: unless I'm mistaken, the current version of ReFrame also requires spaces between features, right?

All of these definitions should be valid

And same here: definitions 6-10 are not valid in the current reframe either?

Don't get me wrong: I'm not against allowing for non-space separated features. But I'm also not sure it should be handled in this PR: allowing non-space seperated features is orthogonal to being able to combine the sysname:partname syntax with the +feat syntax. And I wouldn't want this PR to be delayed because of discussions on whether or not non-space separated features are desirable.

The fifth case is miss interpreted, where one cannot see the +gpu entry.

That's strange, so you're saying that if you define:

daint:mc +gpu +uenv

You're effectively getting

daint:mc +uenv

?

I haven't observed that, but I might have not tested this particular case. I'll try it out...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot reproduce this. I have two partitions, A and B. A has feature foo and B has features foo and bar. If I create a test and set

self.valid_systems = ['sysname:* +bar +foo'

your issue was that bar would be ignored. However, I tested this, and it (correctly) only schedules a test for partition B, which offers bar.

@vkarak
Copy link
Contributor

vkarak commented Feb 26, 2026

@casparvl I see the problem you are facing, but I don't agree with the solution conceptually. Allowing a syntax like valid_systems = ['system:partition +foo'] is contradictory, because we ask for pinning to a specific system/partition combination, and then we actually don't mean it. I would see as valid this valid_systems = ['system +foo'], which it would essentially select all partitions with foo from system, but I think that's superfluous and equivalent to valid_systems = ['+foo'], since the system is given once the test is loaded.

The problem is more fundamental. It is the fact that find_modules() does not understand the features and extras. It should instead take the valid_systems as an argument and then return the system/partition combination that respects both the given constraints and provides the given module.

This pattern is not unique to find_modules() but to any user function that needs to parameterize over the system/partitions/environment combination, and I believe this must be made easier by the framework. Check also the discussion in #3323, which is related.

@casparvl
Copy link
Author

casparvl commented Mar 10, 2026

@vkarak Sorry for the slow response, I was on holiday.

Allowing a syntax like valid_systems = ['system:partition +foo'] is contradictory, because we ask for pinning to a specific system/partition combination, and then we actually don't mean it.

Hm, coming from the more traditional syntax before features/extras where a thing, I see your point. But your solution makes the pretty hard assumption that the valid_systems is known and fixed at the moment find_modules is called (typically in the class body, since you typically parameterize over the modules). That's not the case for us. For our particular case, some of our features get added to valid_systems as part of a run_after('init') hook. In fact: some depend on what find_modules returns (e.g. if a module contains -CUDA, we add a feature +gpu). That means that when find_modules is called (in the class body), the valid_systems doesn't have a final value yet.

In this context, having both a system:partition specification and a +feat specification is the easiest, most pragmatic way to get the right filtering - even if it may feel like an overspecified thing initially.

I'm not saying there aren't any other options: if we have access to configuration objects in the after-init hook (as discussed in #3323 ), I guess we could take the system:partition returned by find_modules, determine if that has the required +gpu feature, and if not just set valid_systems=[] or something (again: all in an after-init hook). In that case, we (at least for our particular use case) wouldn't even need a find_modules that is valid_systems-aware - we'd just fix it all based on what find_modules returns in terms of system-partition combinations.

But: such an approach does feel very clunky, as we're bypassing the whole feature mechanism. Compared to that, I'd find the system:partition +foo specification much more elegant.

@smoors
Copy link
Collaborator

smoors commented Mar 11, 2026

@casparvl I see the problem you are facing, but I don't agree with the solution conceptually. Allowing a syntax like valid_systems = ['system:partition +foo'] is contradictory, because we ask for pinning to a specific system/partition combination, and then we actually don't mean it.

to be blunt, i don't see what's contradictory here. valid_systems = ['system:partition +foo'] does not just ask for a specific system/partition combination, it only asks for that specific system/partition combination if it also has +foo.

i understand why it looks strange to you, but this PR solves our problem in a relatively simple and generic way. is there any technical reason why this can't be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

4 participants